Blind separation of signals

A short comment to NAV IT

This is a project from the course Scientific Computing, spring 2019. The task was to write a program that takes in mixed audio files (for example from several microphones in a room) and separates the individual sounds by making them as non-Gaussian as possible. The idea behind the project was to give a short introduction to a relevant area of computational science with connections to neural networks and machine learning. We chose to go a little beyond the requirements and mixed our own audio clips, which we then separated.

Introduction

With applications in computational science in mind, this project attempts to solve the cocktail-party problem. The crux of the problem is to extract the individual source signals from recordings in which they are mixed: several microphones in a room each pick up a different mixture of the sounds in that room, so the recorded files contain mixed signals that need to be separated. The algorithm in this project applies blind signal separation to audio files, but the same approach is applicable to many other kinds of signal separation.

In this project, $d$ '.wav' audio files, each containing a different mixture of the same $d$ sound tracks, are run through a blind signal separation to recover the individual tracks. The key assumption is that the mixed sound files are more Gaussian than each track on its own, a consequence of the central limit theorem. First the signals are centered and whitened; then, starting from an arbitrary normalized $d\times d$ matrix, the algorithm iteratively computes the $d\times d$ separation matrix that splits the mixed files into tracks. A kurtosis and a negentropy contrast function are used in this step.
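To make the Gaussianity assumption concrete, here is a small sketch (not part of the project code) with hypothetical synthetic signals standing in for audio tracks. Excess kurtosis is zero for a Gaussian, so a mixture of independent non-Gaussian signals should have excess kurtosis closer to zero than its most non-Gaussian source:

```python
import numpy as np

rng = np.random.default_rng(0)

def excess_kurtosis(x):
    """Fourth standardized moment minus 3; zero for a Gaussian signal."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3

# Two hypothetical non-Gaussian sources standing in for audio tracks:
s1 = np.sign(np.sin(np.linspace(0, 200, 10_000)))  # square wave (sub-Gaussian)
s2 = rng.laplace(size=10_000)                      # Laplace noise (super-Gaussian)

mix = 0.5 * s1 + 0.5 * s2  # one "microphone" mixture of the two sources

# The mixture is closer to Gaussian (excess kurtosis nearer 0) than the most
# non-Gaussian source -- this is the property the separation exploits.
print(abs(excess_kurtosis(mix)) < abs(excess_kurtosis(s2)))  # True
```

Maximizing the non-Gaussianity of each output row therefore drives the iteration back towards the original sources.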

In addition to the three mixed sample soundtracks provided, we also used the provided code to mix an arbitrary number of audio files of our own choosing and then applied the algorithm to separate them from each other.

Theory

Whitening

A whitening transformation is a linear transformation that maps a vector of random variables with a known covariance matrix to a new vector whose covariance matrix is the identity, meaning that its components are uncorrelated and each has variance 1.
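A minimal sketch of this transformation on hypothetical random data, using the same SVD-based construction as the implementation further down: compute the covariance matrix $C$, form $T = C^{-1/2}$ from its eigendecomposition, and check that the transformed data has identity covariance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 3 rows (variables), 5000 samples each.
Z = rng.standard_normal((3, 5000))
Z[1] += 0.8 * Z[0]  # make rows 0 and 1 correlated

Zc = Z - Z.mean(axis=1, keepdims=True)   # center each row
C = np.cov(Zc)                           # 3x3 covariance matrix
U, S, _ = np.linalg.svd(C)               # eigendecomposition of the symmetric C
T = U @ np.diag(1.0 / np.sqrt(S)) @ U.T  # whitening matrix T = C^{-1/2}
Zw = T @ Zc                              # whitened data

# The whitened rows are uncorrelated with unit variance:
print(np.allclose(np.cov(Zw), np.eye(3)))  # True
```

Since $C$ is symmetric positive definite, the SVD coincides with the eigendecomposition, which is why the same `np.linalg.svd` call works in the whitening step of the project code.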

Kurtosis and negentropy

The approximations used in this project for the kurtosis and negentropy contrast functions are, respectively, \begin{equation} g_{1}(u) = 4u^{3} \quad \text{and} \quad g_{2}(u) = u e^{-\frac{u^{2}}{2}}. \end{equation}
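The fixed-point update in the main iteration also needs the derivatives of these contrast functions; differentiating term by term (with the exponent in $g_2$ taken as $-u^2/2$, matching the implementation below) gives \begin{equation} g_{1}'(u) = 12u^{2} \quad \text{and} \quad g_{2}'(u) = (1 - u^{2})\, e^{-\frac{u^{2}}{2}}, \end{equation} which are exactly the functions implemented as g_k_derivative and g_n_derivative below.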

Preparations

In [1]:
"Import different audio files to be mixed and tested."
import numpy as np
from wav_file_loader import read_wavefiles


paths = ['audio/mix_1.wav', 'audio/mix_2.wav', 'audio/mix_3.wav']
data, sampling_rate = read_wavefiles(paths)
num_signals = data.shape[0]


paths_own = ['audio/MLK.wav', 'audio/Piano.wav', 'audio/RS.wav','audio/Jesus.wav','audio/Bohemean.wav']
data_own, sampling_rate_own = read_wavefiles(paths_own)
num_signals_own = data_own.shape[0] 
In [17]:
"This function normalizes the audio from the different sources, so that the audio files are of similar intensity."
def normalize_audio(data,d):
    """Scale amplitude s.t. max(data[i]) == 1."""
    abs_data = np.absolute(data)
    maximums = np.amax(abs_data,1)
    # Divide each row by a different vector element:
    data = data / maximums.reshape((d,1))
    return data

data = normalize_audio(data,num_signals)
data_own = normalize_audio(data_own,num_signals_own)
In [3]:
"Play the different input files."
import IPython.display as ipd
print('Provided files:')

ipd.display(ipd.Audio(data[0,:], rate=sampling_rate))
ipd.display(ipd.Audio(data[1,:], rate=sampling_rate))
ipd.display(ipd.Audio(data[2,:], rate=sampling_rate))
Provided files:
In [4]:
print('Our own un-mixed files:')
for i in range(0,len(data_own)):
    ipd.display(ipd.Audio(data_own[i,:], rate=sampling_rate_own))
Our own un-mixed files:

Mixing

In [5]:
def normalize_rowsums(A,d):
    """Divide each row in A by its sum.
    
    The sum of each row in the result is 1.0."""
    the_sum = np.sum(A,1)
    A = A / the_sum.reshape((d,1))
    return A

def random_mixing_matrix(signals, observations):
    """ Creates a random matrix
    
    Each element is a small positive number, not too close to 0.
    (1/11, 5/7).
    """
    A = 0.25 + np.random.rand(observations, signals)
    return normalize_rowsums(A, signals)
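As a quick sanity check (a standalone sketch repeating the two helpers above), each row of the resulting matrix sums to one, and for a 3x3 matrix every weight lies strictly inside (1/11, 5/7):

```python
import numpy as np

np.random.seed(2)

def normalize_rowsums(A, d):
    """Divide each row in A by its sum."""
    return A / np.sum(A, 1).reshape((d, 1))

def random_mixing_matrix(signals, observations):
    """Random mixing matrix with raw entries in (0.25, 1.25)."""
    A = 0.25 + np.random.rand(observations, signals)
    return normalize_rowsums(A, signals)

A = random_mixing_matrix(3, 3)
print(np.allclose(A.sum(axis=1), 1.0))       # True: each row sums to one
print(np.all((A > 1 / 11) & (A < 5 / 7)))    # True: bounds for d = 3
```

The lower bound matters for separability: a mixing weight very close to zero would mean one "microphone" barely hears one of the sources.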
In [6]:
A = random_mixing_matrix(num_signals, num_signals)
A_own = random_mixing_matrix(num_signals_own, num_signals_own)

data_mixed = normalize_audio(A @ data,num_signals)
data_mixed_own = normalize_audio(A_own @ data_own,num_signals_own)
In [19]:
import IPython.display as ipd
print('Provided mixed files:')

ipd.display(ipd.Audio(data_mixed[0,:], rate=sampling_rate))
ipd.display(ipd.Audio(data_mixed[1,:], rate=sampling_rate))
ipd.display(ipd.Audio(data_mixed[2,:], rate=sampling_rate))
Provided mixed files:

Comment to NAV IT

Here you can hear what the self-mixed files sound like before they are separated. The separated files can be found at the very end.

In [8]:
print('Our own mixed files:')

for i in range(0,len(data_mixed_own)): 
    ipd.display(ipd.Audio(data_mixed_own[i,:], rate=sampling_rate_own))
Our own mixed files:

Preprocessing

In [9]:
def center_rows(Z):
    """Ensures each row has zero mean.
    
    Input: A two-dimensional matrix of arbitrary shape.
    Output: A centered version of the matrix where each row has zero mean.
    """
    
    Zc = np.array(Z, copy=True)               # Copy so the input matrix is left untouched.
    Zc -= np.mean(Zc, axis=1, keepdims=True)  # Subtract each row's mean from that row.
    
    return Zc


def whiten_rows(Z):
    """Whitens matrix
    
    Input: A two-dimensional matrix of arbitrary shape
    Output: A whitened version of the matrix (a new matrix whose covariance is the identity matrix, 
            meaning that its rows are uncorrelated and each has variance 1),
            together with the transformation matrix T such that Zw = T @ Z.
    """

    C = np.cov(Z)
    U, S, _ = np.linalg.svd(C, full_matrices=False) # U holds the eigenvectors of C, S its eigenvalues
    T  = U @ np.diag(1 / np.sqrt(S)) @ U.T # Whitening matrix T = C^{-1/2}
    
    Zw = T @ Z
    
    return Zw, T

Main iteration - Maximizing non-gaussian-ness

In [10]:
def normalize_rownorms(Z):
    """Divides each row in Z by its Euclidean norm.
    
    Input: A two-dimensional matrix of arbitrary shape.
    Output: A normalized matrix, meaning that the norm of each row is equal to one.
    """
    
    Zn = np.array(Z, copy=True)
    counter = 0
    for row in Z:
        euclidian = np.sqrt(np.sum(row**2)) # Calculates the Euclidean norm of the row
        Zn[counter] = Zn[counter] / euclidian
        counter += 1
    
    return Zn
In [11]:
def decorrelate_weights(W):
    """Orthogonalizes the matrix, removes the correlations
    
    Input: A dxd matrix W.
    Output: An orthogonal matrix by the transformation Wd = (W @ W^T)^{-1/2} @ W.
    """
    
    W_temp = W @ W.T
    U, S, _ = np.linalg.svd(W_temp, full_matrices=False) # U holds the eigenvectors of W @ W.T, S its eigenvalues
    T = U @ np.diag(1 / np.sqrt(S)) @ U.T # Matrix T is here equal to (W @ W^T)^{-1/2}.
    
    return T @ W
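The defining property of this transformation is that the result has orthonormal rows: $W_d W_d^T = (WW^T)^{-1/2}\,WW^T\,(WW^T)^{-1/2} = I$. A standalone sketch (repeating the function) verifying this on a hypothetical random matrix:

```python
import numpy as np

def decorrelate_weights(W):
    """Symmetric orthogonalization: returns (W @ W.T)^{-1/2} @ W."""
    U, S, _ = np.linalg.svd(W @ W.T, full_matrices=False)
    T = U @ np.diag(1.0 / np.sqrt(S)) @ U.T
    return T @ W

rng = np.random.default_rng(3)
W = rng.random((4, 4))           # hypothetical random weight matrix
Wd = decorrelate_weights(W)

# Rows of Wd are orthonormal, i.e. Wd @ Wd.T is the identity:
print(np.allclose(Wd @ Wd.T, np.eye(4)))  # True
```

Orthogonalizing all rows at once like this prevents several rows of W from converging to the same source during the iteration.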
In [12]:
def g_k(u):
    """Kurtosis-function."""
    return 4 * u**3

def g_k_derivative(u):
    """The derivative of the Kurtosis-function."""
    return 12 * u**2

def g_n(u):
    """Negentropy-function."""
    return u * np.exp(-u**2 / 2)

def g_n_derivative(u):
    """The derivative of the Negentropy-function."""
    return (1 - u**2) * np.exp(-u**2 / 2)

def update_W(W, Zcw, f, f_derivative):
    """Calculates W_k+1 from W_k.
    
    Input: A dxd matrix W = W_k.
           The centered, whitened data Zcw which is the dxN matrix known as tilde{x} in the note.
    Output: The new W, W_{k+1}.
    """

    S = W @ Zcw # Current estimate of the separated signals
    
    G = f(S) # Applies the kurtosis- or the negentropy-function on each element in S
    G_derivative = f_derivative(S) # Applies the derivative of the kurtosis- or the negentropy-function on each element in S
    
    E_G_derivative = np.mean(G_derivative, axis=1) # Vector containing the average value of each row in G_derivative.
    
    diag_E_G_derivative = np.diag(E_G_derivative) # Puts the row averages on the diagonal of a dxd matrix
            
    W_plus = ((1 / len(G[0])) * (G @ Zcw.T)) - (diag_E_G_derivative @ W) # Calculates W_{plus} with the method from the notes.
    
    return decorrelate_weights(normalize_rownorms(W_plus)) # Normalizes and then decorrelates W_{plus}
    
In [13]:
def measure_of_convergence(W1, W2):
    """This function computes an error estimate for the maximisation iteration, it computes the convergence
    criterion given in the note. 
    
    Input: W1 is the previous iterate, and W2 is the one just computed.
    Output: The quantity delta defined in the note.
    """

    delta_list = np.array([])
    
    Ws_multiplied = W1 * W2 # W1_{i, j} * W2_{i, j} for all elements.
    for row in Ws_multiplied:
        delta_list = np.append(delta_list, [1 - np.absolute(np.sum(row))]) # Calculates the value delta for each row.
    
    return np.amax(delta_list) # Returns the largest value delta.
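Because the rows of each iterate are normalized, the row sums of the element-wise product are the inner products of corresponding rows; when consecutive iterates agree up to a sign flip these are $\pm 1$ and delta is 0. A standalone sketch (a vectorized version of the same computation):

```python
import numpy as np

def measure_of_convergence(W1, W2):
    """delta = max over rows i of 1 - |sum_j W1[i, j] * W2[i, j]|."""
    return np.amax(1.0 - np.abs(np.sum(W1 * W2, axis=1)))

W = np.eye(3)  # any matrix with orthonormal rows works here
print(measure_of_convergence(W, W))   # 0.0: identical iterates
print(measure_of_convergence(W, -W))  # 0.0: a pure sign flip also counts as converged
```

Taking the absolute value of each inner product is what makes the criterion insensitive to the sign ambiguity inherent in ICA.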
     
In [14]:
import warnings


def fast_ICA(Z, signals_to_find, f, f_derivative, tol=1e-10, max_iter=200):
    """ This is the function that organises all the work.
    
    Input: Z is the unprocessed data.
           signals_to_find is the number of sources as well as the number of signals.
           f is the kurtosis or negentropy function.
           f_derivative is the derivative of f.
           tol is the tolerance, default value 1.0e-10.
           max_iter is the max number of iterations in the while loop, default value 200.
    Output: Z_ica is the separated signals as a dxN matrix approximating the sources.
            W is the final converged dxd W-matrix.
            number_of_iterations is the number of iterations the while loop performs.
    """
    
    Zc = center_rows(Z) # Centers the rows.
    Zcw, T = whiten_rows(Zc) # Whitens the rows.
    W_k = np.random.rand(signals_to_find, signals_to_find) # Creates a dxd matrix with random values between 0 and 1.
    W_k = normalize_rownorms(W_k) # Normalizes W_k.
    delta = 1
    number_of_iterations = 0
    
    while delta > tol and number_of_iterations < max_iter:
        W_k1 = update_W(W_k, Zcw, f, f_derivative) # Calculates W_{k+1}.
        delta = measure_of_convergence(W_k, W_k1) # Calculates delta.
        W_k = np.array(W_k1, copy=True)
        number_of_iterations += 1
    
    Z_ica = W_k @ Zcw # Separates the audios.
    
    return Z_ica, W_k, number_of_iterations

print("Standard data analysed with the Kurtosis function.")
    
result, W_K, number_of_iterations = fast_ICA(data, num_signals, g_k, g_k_derivative)

print("Number of iterations: ", number_of_iterations)

ipd.display(ipd.Audio(result[0,:], rate=sampling_rate))
ipd.display(ipd.Audio(result[1,:], rate=sampling_rate))
ipd.display(ipd.Audio(result[2,:], rate=sampling_rate))
Standard data analysed with the Kurtosis function.
Number of iterations:  11
In [21]:
print("Standard data analysed with the Negentropy function.")

result2, W_K, number_of_iterations = fast_ICA(data, num_signals, g_n, g_n_derivative)

print("Number of iterations: ", number_of_iterations)

ipd.display(ipd.Audio(result2[0,:], rate=sampling_rate))
ipd.display(ipd.Audio(result2[1,:], rate=sampling_rate))
ipd.display(ipd.Audio(result2[2,:], rate=sampling_rate))
Standard data analysed with the Negentropy function.
Number of iterations:  8

Comment to NAV

Here you can hear the separated files.

In [16]:
print("Our own data analysed with the Negentropy function.")

result2_own, W_K, number_of_iterations = fast_ICA(data_own, num_signals_own, g_n, g_n_derivative, tol=1e-30)

print("Number of iterations: ", number_of_iterations)

for i in range(0,len(result2_own)):
    ipd.display(ipd.Audio(result2_own[i,:], rate=sampling_rate_own))
Our own data analysed with the Negentropy function.
Number of iterations:  14

Conclusion

Both the separation matrix computed with the kurtosis function and the one computed with the negentropy function managed to split the provided soundtracks. There was some noise in the speech tracks, but it was easy to make out what each person said, and the music track sounded clear with no noticeable noise. One possible reason for this difference in quality could be that speech contains more moments of complete silence than music usually does.

The code works for any randomized and then normalized initial matrix $W_{0}$, with the exception of the null matrix and a few other special matrices for which it cannot converge. The only difference is the number of iterations needed to reach the tolerance criterion. With tolerance $tol = 10^{-10}$, the kurtosis function usually needs between 8 and 10 iterations, while negentropy normally needs between 6 and 8. The difference is not significant, but negentropy is usually faster.

As measures of non-Gaussianity, neither negentropy nor kurtosis produced any noticeable audible difference. The same held when the tolerance was tightened: reducing $tol$ from $10^{-10}$ to $10^{-30}$ did not noticeably reduce the white noise in each separated file.

When several tracks containing large portions of silence were used, the mixed file could not be split successfully. This makes sense considering the notable similarity between the silent portions of different tracks. In addition, if the difference in sound intensity between the files was large, the tracks with significantly higher intensity would bleed into the other separated tracks, despite the use of the normalize_audio function. To test this, insert the provided Violin.wav file.

When using the provided code to mix different files, a few limitations were discovered. The trivial one is that the tracks must contain the same number of samples. A second limitation is that the file sizes must stay below a certain limit; otherwise, an error message about the data limit of Jupyter Notebook appears. The workaround is to manually raise the notebook's data limit. An attempt to separate three 10-second files from each other resulted in such a data-limit error. An additional requirement is that the files must be mono (single channel).

In conclusion, the separation was successful to a varying degree, although this particular implementation of the algorithm is limited in scope, as discussed above.